Improving End-to-End Speech Translation by Leveraging Auxiliary Speech and Text Data

Authors

Abstract

We present a method for introducing a text encoder into pre-trained end-to-end speech translation systems. It enhances the ability to adapt one modality (i.e., source-language speech) to another (i.e., source-language text). Thus, the speech translation model can learn from both unlabeled and labeled data, especially when source-language text data is abundant. Beyond this, we present a denoising method to build a robust text encoder that can deal with both normal and noisy text data. Our system sets new state-of-the-art results on the MuST-C En-De, En-Fr, and LibriSpeech En-Fr tasks.
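The abstract does not spell out the denoising scheme. As a rough illustration only, a common way to train an encoder to cope with noisy text is to corrupt clean token sequences and train on (noisy, clean) pairs. The sketch below assumes three generic noise types (token deletion, random substitution, and masking); the function `add_text_noise`, its probability parameters, and the `<mask>` symbol are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Optional


def add_text_noise(
    tokens: List[str],
    vocab: List[str],
    p_delete: float = 0.1,
    p_substitute: float = 0.1,
    p_mask: float = 0.1,
    mask_token: str = "<mask>",
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Corrupt a clean token sequence (hypothetical noising scheme, not the paper's).

    Each token is independently deleted, replaced by a random vocabulary word,
    replaced by a mask symbol, or kept, according to the given probabilities.
    """
    rng = rng or random.Random()
    noisy: List[str] = []
    for tok in tokens:
        r = rng.random()  # uniform draw in [0, 1)
        if r < p_delete:
            continue  # drop the token entirely
        elif r < p_delete + p_substitute:
            noisy.append(rng.choice(vocab))  # swap in a random vocabulary word
        elif r < p_delete + p_substitute + p_mask:
            noisy.append(mask_token)  # hide the token behind a mask symbol
        else:
            noisy.append(tok)  # keep the token unchanged
    return noisy


if __name__ == "__main__":
    clean = "the cat sat on the mat".split()
    noisy = add_text_noise(clean, vocab=["dog", "ran", "a"], rng=random.Random(0))
    # A denoising objective would train the encoder on (noisy, clean) pairs.
    print(noisy, "->", clean)
```

Passing a seeded `random.Random` makes the corruption reproducible, which helps when comparing training runs.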

Similar Resources

Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text Translation

Current speech translation systems integrate (loosely or closely) two main modules: source language speech recognition (ASR) and source-to-target text translation (MT). In these approaches, source language text transcript (as a sequence or as a graph) appears as mandatory to produce a text hypothesis in the target language. In the meantime, deep neural networks have yielded breakthroughs in dif...

End-to-End Automatic Speech Translation of Audiobooks

We investigate end-to-end speech-to-text translation on a corpus of audiobooks specifically augmented for this task. Previous works investigated the extreme case where source language transcription is not available during learning nor decoding, but we also study a midway case where source language transcription is available at training time only. In this case, a single model is trained to decod...

Improving End-to-End Speech Recognition with Policy Learning

Connectionist temporal classification (CTC) is widely used for maximum likelihood learning in end-to-end speech recognition models. However, there is usually a disparity between the negative maximum likelihood and the performance metric used in speech recognition, e.g., word error rate (WER). This results in a mismatch between the objective function and metric during training. We show that the ...
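The abstract above contrasts the likelihood objective with WER. For reference, WER is the word-level edit distance between a hypothesis and a reference (substitutions + deletions + insertions) divided by the number of reference words. Because it is computed from discrete decoded outputs by dynamic programming, it is not a differentiable function of the model's output probabilities, which is the mismatch the abstract describes. A minimal sketch, with `word_error_rate` as a hypothetical helper name:

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 / 6
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions.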

End-to-End Evaluation in JANUS: A Speech-to-speech Translation System

JANUS is a multi-lingual speech-to-speech translation system designed to facilitate communication between two parties engaged in a spontaneous conversation in a limited domain. In this paper we describe our methodology for evaluating translation performance. Our current focus is on end-to-end evaluations, i.e., the evaluation of the translation capabilities of the system as a whole. The main goal of ou...

An Experimental Methodology for an End-to-End Evaluation in Speech-to-Speech Translation

This paper describes the evaluation methodology used to evaluate the TC-STAR speech-to-speech translation (SST) system and their results from the third year of the project. It follows the results presented in (Hamon et al., 2007), dealing with the first end-to-end evaluation of the project. In this paper, we try to experiment with the methodology and the protocol during the second end-to-end ev...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i11.26637